Enhanced block ciphers with data-dependent rotations

ABSTRACT

A plaintext message to be encrypted is segmented into a number of words, e.g., four words stored in registers A, B, C and D, and an integer multiplication function is applied to a subset of the words, e.g., to the two words in registers B and D. The use of such an integer multiplication greatly increases the diffusion achieved per round of encryption, allowing for higher security per round and increased throughput. The integer multiplication function may be a quadratic function of the form ƒ(x)=x(ax+b), where a is an even integer and b is an odd integer, or another suitable function such as a higher-order polynomial. The results of the integer multiplication function are rotated by lg w bits, where lg denotes log base 2 and w is the number of bits in a given word, to generate a pair of intermediate results t and u. An exclusive-or of another word, e.g., the word in register A, and one of the intermediate results, e.g., t, is rotated by an amount determined by the other intermediate result u. Similarly, an exclusive-or of the remaining word, in register C, and the intermediate result u is rotated by an amount determined by the other intermediate result t. An element of a secret key array is applied to each of these rotation results, and the register contents are then transposed. This process is repeated for a designated number of rounds to generate a ciphertext message. Pre-whitening and post-whitening operations may be included to ensure that the input or output does not reveal any internal information about any encryption round. Corresponding decryption operations may be used to decrypt the ciphertext message.

FIELD OF THE INVENTION

[0001] The present invention relates generally to cryptography, and more particularly to block ciphers for implementing encryption and decryption operations in cryptographic applications.

BACKGROUND OF THE INVENTION

[0002] In a conventional block cipher cryptographic system, a plaintext message is encrypted using a secret key, and is transmitted in its encrypted form. A receiver decrypts the encrypted message using the same secret key in order to recover the plaintext message. An example of a conventional block cipher is the Data Encryption Standard (DES) cipher. DES and other conventional block ciphers are described in B. Schneier, Applied Cryptography, pp. 154-185 and 219-272, John Wiley & Sons, New York, 1994, which is incorporated by reference herein. An improved block cipher utilizing data-dependent rotations is described in U.S. Pat. No. 5,724,428, issued Mar. 3, 1998 in the name of inventor R. L. Rivest, which is incorporated by reference herein. This improved cipher is referred to as RC5™, which is a trademark of RSA Data Security, Inc. of Redwood City, Calif., the assignee of U.S. Pat. No. 5,724,428. The RC5™ block cipher in an illustrative embodiment provides improved performance in part through the use of data-dependent rotations in which a given word of an intermediate encryption result is cyclically rotated by an amount determined by low-order bits of another intermediate result.

[0003] The security of the RC5™ block cipher is analyzed in, for example, B. S. Kaliski Jr. and Y. L. Yin, “On Differential and Linear Cryptanalysis of the RC5™ Encryption Algorithm,” in D. Coppersmith, ed., Advances in Cryptology: Crypto '95, Vol. 963 of Lecture Notes in Computer Science, pp. 171-184, Springer Verlag, 1995; L. R. Knudsen and W. Meier, “Improved Differential Attacks on RC5™,” in N. Koblitz, ed., Advances in Cryptology: Crypto '96, Vol. 1109 of Lecture Notes in Computer Science, pp. 216-228, Springer Verlag, 1996; A. A. Selcuk, “New Results in Linear Cryptanalysis of RC5™,” in S. Vaudenay, ed., Fast Software Encryption, Vol. 1372 of Lecture Notes in Computer Science, pp. 1-16, Springer Verlag, 1998; and A. Biryukov and E. Kushilevitz, “Improved Cryptanalysis of RC5™,” to appear in proceedings of Advances in Cryptology: Eurocrypt '98, Lecture Notes in Computer Science, Springer Verlag, 1998; all of which are incorporated by reference herein. These analyses have provided a greater understanding of how the structure and operations of RC5™ contribute to its security. Although no practical attack on RC5™ has been found, the above-cited references describe a number of interesting theoretical attacks.

[0004] It is therefore an object of the present invention to provide a further improved block cipher which not only exhibits additional security by thwarting one or more of the above-noted theoretical attacks, but also exhibits enhanced implementability in a wide variety of cryptographic applications.

SUMMARY OF THE INVENTION

[0005] The present invention provides an improved block cipher in which data-dependent rotations are influenced by an additional primitive operation in the form of an integer multiplication. The use of such an integer multiplication greatly increases the diffusion achieved per round of encryption, allowing for higher security per round and increased throughput. The integer multiplication is used to compute rotation amounts for data-dependent rotations, such that the rotation amounts are dependent on substantially all of the bits of a given register, rather than just the low-order bits as in the above-described embodiment of the RC5™ block cipher.

[0006] In an illustrative embodiment of the invention, a plaintext message to be encrypted is segmented into four words stored in registers A, B, C and D, and an integer multiplication function is applied to two of the words, those in registers B and D. The integer multiplication function may be a quadratic function of the form ƒ(x)=x(ax+b), where a is an even integer and b is an odd integer. Other types of functions, including polynomials with degree greater than two, may be used in alternative embodiments. The results of the integer multiplication function in the illustrative embodiment are rotated by lg w bits, where lg denotes log base 2 and w is the number of bits in a given word, to generate a pair of intermediate results t and u. An exclusive-or of the contents of another register, e.g., A, and one of the intermediate results, e.g., t, is rotated by an amount determined by the other intermediate result u. Similarly, an exclusive-or of the contents of the remaining register C and the intermediate result u is rotated by an amount determined by the other intermediate result t. An element of a secret key array is applied to each of these rotation results, and the register contents are then transposed. This process is repeated for a designated number of rounds to generate a ciphertext message. Pre-whitening and post-whitening operations may be included to ensure that the input or output does not reveal any internal information about any encryption round. For example, the values in registers B and D may be pre-whitened before starting the first round by applying elements of the secret key array to these values. Similarly, the values in registers A and C may be post-whitened after completion of the designated number of rounds by applying elements of the secret key array to these values. Corresponding decryption operations may be used to recover the original plaintext message.

BRIEF DESCRIPTION OF THE DRAWINGS

[0007] FIGS. 1 and 2 show, in the form of pseudocode and a process diagram, respectively, an exemplary encryption process in accordance with an illustrative embodiment of the invention.

[0008] FIGS. 3 and 4 show, in the form of pseudocode and a process diagram, respectively, the corresponding decryption process.

[0009] FIG. 5 shows an exemplary key generation process in accordance with the invention.

[0010] FIG. 6 shows an illustrative system incorporating encryption and decryption processes in accordance with the invention.

DETAILED DESCRIPTION OF THE INVENTION

[0011] The illustrative embodiment of the invention to be described below is designed to meet the requirements of the Advanced Encryption Standard (AES) as set forth by the National Institute of Standards and Technology (NIST). To meet the requirements of the AES, a block cipher must handle 128-bit input and output blocks. The specified target architecture and languages for the AES do not yet support 64-bit operations in an efficient and clean manner. The illustrative embodiment to be described below uses four 32-bit registers. The invention thus exploits 32-bit operations, such as integer multiplications, that are efficiently implemented on modern processors.

[0012] The illustrative block cipher in accordance with the invention is referred to as RC6™, which is a trademark of RSA Data Security, Inc. of Redwood City, Calif., assignee of the present application.

[0013] Like the above-described RC5™ block cipher, the RC6™ block cipher in accordance with the invention may be viewed as a parameterized family of encryption algorithms. A given version of RC6™ can be specified as RC6™-w/r/b, where the word size is w bits, the encryption process includes a nonnegative number of rounds r, and b denotes the length of the encryption key in bytes. Of particular relevance to the AES will be versions of RC6™ with 16-, 24- and 32-byte keys. For all variants in the illustrative embodiment, RC6™-w/r/b operates on four w-bit words using the following six basic operations. The base-two logarithm of w will be denoted by lg w.

    a + b      integer addition modulo 2^w
    a − b      integer subtraction modulo 2^w
    a ⊕ b      bitwise exclusive-or of w-bit words
    a × b      integer multiplication modulo 2^w
    a <<< b    rotate the w-bit word a to the left by the amount given by the least significant lg w bits of b
    a >>> b    rotate the w-bit word a to the right by the amount given by the least significant lg w bits of b
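As a rough illustration, the six primitives can be written out in Python for the w = 32 case. This is a sketch under the assumption that Python's unbounded integers, masked to 32 bits, stand in for w-bit registers; the names add, sub, xor, mul, rotl and rotr are hypothetical and are not taken from the figures.

```python
W = 32                # word size w in bits (assumed: RC6-32)
MASK = (1 << W) - 1   # keeps every result in the range 0 .. 2**w - 1


def add(a: int, b: int) -> int:
    """a + b: integer addition modulo 2**w."""
    return (a + b) & MASK


def sub(a: int, b: int) -> int:
    """a - b: integer subtraction modulo 2**w."""
    return (a - b) & MASK


def xor(a: int, b: int) -> int:
    """a XOR b: bitwise exclusive-or of two w-bit words."""
    return a ^ b


def mul(a: int, b: int) -> int:
    """a x b: integer multiplication modulo 2**w."""
    return (a * b) & MASK


def rotl(a: int, b: int) -> int:
    """a <<< b: rotate a left by the least significant lg w bits of b."""
    s = b & (W - 1)   # W is a power of two, so this keeps exactly lg w bits
    return ((a << s) | (a >> (W - s))) & MASK


def rotr(a: int, b: int) -> int:
    """a >>> b: rotate a right by the least significant lg w bits of b."""
    s = b & (W - 1)
    return ((a >> s) | (a << (W - s))) & MASK
```

Because w is a power of two, masking b with w − 1 is the same as keeping its least significant lg w bits, so, for example, rotl(x, 36) rotates x left by 4.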

[0014] Note that in the following description of RC6™ the term “round” is in accordance with the more established DES-like idea of a round, i.e., half of the data is updated by the other half, and the two are then swapped. Various descriptions of the RC5™ block cipher have used the term “half-round” to describe this type of action, such that a given RC5™ round included two half-rounds. The present description will utilize the more established meaning of a “round.”

[0015] FIG. 1 illustrates an encryption process in accordance with the illustrative embodiment of the invention. As noted above, the illustrative embodiment uses four w-bit input registers. These registers, which are designated A, B, C and D, contain an initial plaintext message to be encrypted, as well as the output ciphertext message at the end of encryption. The designations A, B, C and D will also be used herein to refer to the contents of the registers. The first byte of plaintext or ciphertext is placed into the least-significant byte of A; the last byte of plaintext or ciphertext is placed into the most-significant byte of D. The operation (A, B, C, D)=(B, C, D, A) denotes the parallel assignment of values on the right to registers on the left. In the encryption process of FIG. 1, the user supplies a key of b bytes. Extra zero bytes are appended to the key if necessary to make the length of the key a non-zero multiple of four bytes. From this, 2r+4 w-bit words are derived, which are stored in an array S[0, . . . , 2r+3]. This array is used in both encryption and decryption. Additional aspects of the key schedule are described below in conjunction with FIG. 5.
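The register loading convention can be sketched as below, assuming w = 32 (so a block is 16 bytes) and using Python's standard struct module; load_block and store_block are hypothetical helper names. The little-endian format "<4I" puts the first byte into the least-significant byte of A and the last byte into the most-significant byte of D.

```python
import struct


def load_block(block: bytes) -> tuple:
    """Split a 16-byte block into the words (A, B, C, D).

    "<4I" reads four little-endian 32-bit words, so the first byte lands in
    the least-significant byte of A and the last byte in the most-significant
    byte of D.
    """
    if len(block) != 16:
        raise ValueError("an RC6-32 block is 16 bytes")
    return struct.unpack("<4I", block)


def store_block(a: int, b: int, c: int, d: int) -> bytes:
    """Inverse of load_block: write (A, B, C, D) back out as 16 bytes."""
    return struct.pack("<4I", a, b, c, d)
```

Python's tuple assignment a, b, c, d = b, c, d, a evaluates the whole right-hand side before assigning anything, so it performs the parallel assignment (A, B, C, D)=(B, C, D, A) directly.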

[0016] The input to the encryption process of FIG. 1 includes a plaintext message stored in registers A, B, C and D, a specified number r of rounds, and the secret key in the form of the above-noted array S[0, . . . , 2r+3] of w-bit round keys. The output ciphertext is stored in the registers A, B, C and D. The steps of the encryption process are shown as pseudocode in FIG. 1, and are illustrated in a process diagram in FIG. 2. Referring to FIG. 2, the operations between the horizontal dashed lines are repeated in the for loop of the pseudocode for each of the r rounds. Before entering the loop, the contents of register B are summed in operation 10 with S[0] from the secret key array, and the contents of register D are summed in operation 12 with S[1] from the array. These operations provide “pre-whitening,” which prevents the plaintext from revealing part of the input to the first round of encryption.

[0017] The pre-whitened value of B is supplied to an operation 14 which computes the function ƒ(x)=x(2x+1), where x is the input of the function, i.e., the pre-whitened value of B in operation 14. A general function of the form:

$$ f(x) = x(ax + b) \bmod 2^{w} $$

[0018] has the following properties: (i) when a is even and b is odd, ƒ(x) maps {0, 1, . . . , 2^w−1} onto itself, i.e., it is a permutation such that each input x gives a different result; and (ii) when a=2 and b=1, every bit of the input x has some effect on the high-order lg w bits of ƒ(x) for most inputs x. The above-noted function ƒ(x)=x(2x+1) is an example of an integer multiplication function which provides properties (i) and (ii). Other functions which provide these or similar properties, including higher-order polynomials, could be used in alternative embodiments of the invention. One such alternative is a function in which a is zero and b is an odd integer which varies from round to round. It should be noted that, although the output of the general function in the form given above is taken mod 2^w, this is not a requirement of an integer multiplication function in accordance with the invention. The term “integer multiplication function” as used herein should be understood to include any function involving integer multiplication, including an integer multiplication itself.
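Property (i) can be checked exhaustively when the word size is small. The sketch below uses w = 8 and w = 10 only so that every input can be enumerated quickly (the property itself is stated for general w); f and is_permutation are hypothetical names.

```python
def f(x: int, a: int, b: int, w: int) -> int:
    """The general function f(x) = x(ax + b) mod 2**w."""
    return (x * (a * x + b)) % (1 << w)


def is_permutation(a: int, b: int, w: int) -> bool:
    """Exhaustively check whether f maps {0, ..., 2**w - 1} onto itself."""
    n = 1 << w
    return len({f(x, a, b, w) for x in range(n)}) == n


# a even, b odd (including a = 0): permutations, as property (i) states.
print(is_permutation(2, 1, 8), is_permutation(0, 3, 8), is_permutation(4, 7, 10))
# True True True

# (a, b) = (1, 1) and (2, 2): x(x + 1) and 2x(x + 1) are always even,
# so at most half of the 2**w values can be reached.
print(is_permutation(1, 1, 8), is_permutation(2, 2, 8))
# False False
```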

[0019] The output of operation 14, which corresponds to B×(2B+1), is then rotated to the left by lg w bits in operation 16, and the result t is exclusive-or'd with A in operation 18. The same function ƒ(x)=x(2x+1) is applied in operation 20 to the pre-whitened value of D. The output of operation 20, which corresponds to D×(2D+1), is rotated to the left by lg w bits in operation 22, and the result u is exclusive-or'd with C in operation 24. Operation 26 rotates the result of the exclusive-or of A and t to the left by an amount given by the least significant lg w bits of u, and the element S[2i] of the secret key array is added in operation 28 to the result of the rotation. Operation 30 rotates the result of the exclusive-or of C and u to the left by an amount given by the least significant lg w bits of t, and the element S[2i+1] of the secret key array is then added in operation 32 to the result of the rotation. The operation (A, B, C, D)=(B, C, D, A) is then applied, such that register A receives the current value of B (the pre-whitened value in the first round), register B receives the output of operation 32, register C receives the current value of D (again the pre-whitened value in the first round), and register D receives the output of operation 28. As previously noted, the operations between the horizontal dashed lines in FIG. 2 are repeated for r rounds. At the completion of the r rounds, the values of A and C are subject to “post-whitening” operations 34 and 36, respectively, to ensure that the resulting ciphertext will not reveal any part of the input to the last round of encryption. Operations 34 and 36 add elements S[2r+2] and S[2r+3] to the respective A and C values to generate post-whitened values of A and C. The registers A, B, C and D then contain the encrypted ciphertext corresponding to the original plaintext.
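The encryption steps walked through above can be sketched in Python as follows. The sketch assumes w = 32 and a list S of 2r+4 round keys already produced by some key schedule; encrypt_words and rotl are hypothetical names, and the comments map each line to the operation numbers of FIG. 2 as they are described in the text, not to the figure's actual pseudocode.

```python
W = 32                      # word size (assumed: RC6-32/r/b)
LGW = 5                     # lg w for w = 32
MASK = (1 << W) - 1


def rotl(x: int, s: int) -> int:
    """x <<< s, using only the least significant lg w bits of s."""
    s &= W - 1
    return ((x << s) | (x >> (W - s))) & MASK


def encrypt_words(a, b, c, d, S, r=20):
    """Encrypt the block held in (A, B, C, D) with round keys S[0 .. 2r+3]."""
    if len(S) != 2 * r + 4:
        raise ValueError("S must hold 2r + 4 round keys")
    b = (b + S[0]) & MASK                              # pre-whitening, op. 10
    d = (d + S[1]) & MASK                              # pre-whitening, op. 12
    for i in range(1, r + 1):
        t = rotl((b * (2 * b + 1)) & MASK, LGW)        # ops. 14 and 16
        u = rotl((d * (2 * d + 1)) & MASK, LGW)        # ops. 20 and 22
        a = (rotl(a ^ t, u) + S[2 * i]) & MASK         # ops. 18, 26 and 28
        c = (rotl(c ^ u, t) + S[2 * i + 1]) & MASK     # ops. 24, 30 and 32
        a, b, c, d = b, c, d, a                        # (A,B,C,D) = (B,C,D,A)
    a = (a + S[2 * r + 2]) & MASK                      # post-whitening, op. 34
    c = (c + S[2 * r + 3]) & MASK                      # post-whitening, op. 36
    return a, b, c, d


print(encrypt_words(0, 1, 0, 0, [0] * 6, r=1))   # (1, 0, 0, 96)
```

In the one-round example with all-zero round keys, t = (1×3) <<< 5 = 96 and u = 0, so A becomes 96 and the transposition carries it into D.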

[0020] The corresponding steps of the decryption process are shown as pseudocode in FIG. 3 and illustrated in a process diagram in FIG. 4. The input to the decryption process is the ciphertext stored in the four w-bit registers A, B, C and D, the number of rounds r, and the secret round key array S[0, . . . , 2r+3]. The output of the decryption process is the original plaintext. Operations 40 and 42 update the values of A and C to reverse the post-whitening operations 34 and 36, respectively, of FIG. 2. The operations between the horizontal dashed lines in FIG. 4 are then repeated for each of the r rounds, and correspond to the operations within the for loop in the pseudocode of FIG. 3. Within the loop, the operation (A, B, C, D)=(D, A, B, C) is applied, such that register A receives the value of D, register B receives the value of A (in the first round, the revised value generated in operation 40), register C receives the value of B, and register D receives the value of C (in the first round, the revised value generated in operation 42).

[0021] The function ƒ(x)=x(2x+1) is applied in operation 44 to D, and the result, which corresponds to D×(2D+1), is rotated to the left by lg w bits in operation 46 to generate u. Similarly, the function ƒ(x)=x(2x+1) is applied in operation 50 to B, and the result, which corresponds to B×(2B+1), is rotated to the left by lg w bits in operation 52 to generate t. The element S[2i+1] of the secret key array is subtracted in operation 54 from C, and the result is rotated to the right in operation 56 by the least significant lg w bits of t. In operation 58, C takes on the result of the exclusive-or of u and the output of the rotate operation 56. Similarly, the element S[2i] of the secret key array is subtracted in operation 60 from A, and the result is rotated to the right in operation 62 by the least significant lg w bits of u. In operation 64, A takes on the result of the exclusive-or of t with the output of the rotate operation 62. After the r rounds are completed, operations 66 and 68 update the values of D and B to reverse the pre-whitening operations 12 and 10, respectively, of FIG. 2.
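A matching sketch of the decryption steps, under the same assumptions (w = 32, a precomputed list S of 2r+4 round keys). decrypt_words, rotl and rotr are hypothetical names, and the operation numbers in the comments follow the description of FIG. 4 above. The round index is assumed to run from r down to 1 so that S[2i] and S[2i+1] are consumed in the reverse of the encryption order.

```python
W = 32
LGW = 5                     # lg w for w = 32
MASK = (1 << W) - 1


def rotl(x: int, s: int) -> int:
    s &= W - 1              # only the low lg w bits of s are used
    return ((x << s) | (x >> (W - s))) & MASK


def rotr(x: int, s: int) -> int:
    s &= W - 1
    return ((x >> s) | (x << (W - s))) & MASK


def decrypt_words(a, b, c, d, S, r=20):
    """Invert the encryption of (A, B, C, D) under round keys S[0 .. 2r+3]."""
    if len(S) != 2 * r + 4:
        raise ValueError("S must hold 2r + 4 round keys")
    c = (c - S[2 * r + 3]) & MASK                      # op. 42 undoes op. 36
    a = (a - S[2 * r + 2]) & MASK                      # op. 40 undoes op. 34
    for i in range(r, 0, -1):                          # rounds run backwards
        a, b, c, d = d, a, b, c                        # (A,B,C,D) = (D,A,B,C)
        u = rotl((d * (2 * d + 1)) & MASK, LGW)        # ops. 44 and 46
        t = rotl((b * (2 * b + 1)) & MASK, LGW)        # ops. 50 and 52
        c = rotr((c - S[2 * i + 1]) & MASK, t) ^ u     # ops. 54, 56 and 58
        a = rotr((a - S[2 * i]) & MASK, u) ^ t         # ops. 60, 62 and 64
    d = (d - S[1]) & MASK                              # op. 66 undoes op. 12
    b = (b - S[0]) & MASK                              # op. 68 undoes op. 10
    return a, b, c, d
```

As a small check, decrypting (1, 0, 0, 96) with r = 1 and all-zero round keys returns (0, 1, 0, 0): inside the round, t = (1×3) <<< 5 = 96 and u = 0, so A = 96 ⊕ 96 = 0.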

[0022] The above-described illustrative embodiment includes at least two significant changes relative to the conventional RC5™ block cipher. These are the introduction of the quadratic function ƒ(x)=x(2x+1) and the fixed rotation by lg w bits. The use of the quadratic function is aimed at providing a much faster rate of diffusion, thereby improving the chances that simple differentials will spoil rotation amounts much sooner than in RC5™. The quadratically transformed values of B and D are used in place of B and D as additives for the registers A and C, increasing the nonlinearity of the cipher while not losing any entropy (since the transformation is a permutation). The fixed rotation plays a simple yet important role in hindering both linear and differential cryptanalysis.

[0023] FIG. 5 shows an example of a key schedule suitable for use in generating the secret key array S[0, . . . , 2r+3] used in the illustrative embodiment of FIGS. 1 through 4. The key schedule of FIG. 5 is similar to that used for RC5™ and described in detail in the above-cited U.S. Pat. No. 5,724,428, but derives more words from the user-supplied key for use during encryption and decryption. The key schedule uses two w-bit registers A and B, along with variables i, j, v and s. The inputs to the key schedule are a user-supplied key of b bytes and the number of rounds r. The output is the array S[0, . . . , 2r+3] of w-bit round keys. Extra zero bytes are appended to the user-supplied key if necessary to make the length of the key a non-zero multiple of w/8 bytes. This is then stored as a sequence of c w-bit words L[0], . . . , L[c−1], with the first byte of the key stored as the low-order byte of L[0], etc., and L[c−1] padded with high-order zero bytes if necessary. Note that if b=0, then c=1 and L[0]=0.

[0024] In the key generation procedure shown in FIG. 5, element S[0] is initialized to a designated constant P_w. In the first for loop, the remaining elements S[i] are initialized using a constant Q_w, after which the values of A, B, i and j are set to zero. The second for loop generates w-bit words for the secret key array. The number of w-bit words that will be generated for the round keys is 2r+4, and these are stored in the array S[0, . . . , 2r+3]. The constants P_w and Q_w in FIG. 5 may be, for example, P₃₂=B7E15163 and Q₃₂=9E3779B9 (hexadecimal), i.e., the so-called “magic constants” used for the RC5™ key schedule in U.S. Pat. No. 5,724,428. The value of P₃₂ is derived from the binary expansion of e−2, where e is the base of the natural logarithm function. The value of Q₃₂ is derived from the binary expansion of φ−1, where φ is the Golden Ratio. Similar definitions for P₆₄, Q₆₄ and so on can be used for versions of RC6™ with other word sizes. It should be noted that these values are to some extent arbitrary, and other values could be used in alternative embodiments of the invention. Other suitable key schedules could also be used in place of the FIG. 5 key schedule.
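A sketch of the key schedule described in [0023] and [0024], for w = 32 only. It derives P₃₂ and Q₃₂ from e−2 and φ−1 by taking the nearest odd integer to the value scaled by 2^32 (an assumption about the exact rounding rule, which does reproduce B7E15163 and 9E3779B9 here), and it assumes the second for loop runs v = 3 × max{c, 2r+4} times, as in the published RC6 key schedule. key_schedule, odd and rotl are hypothetical names.

```python
import math

W = 32
MASK = (1 << W) - 1


def odd(x: float) -> int:
    """Nearest odd integer to a non-integer x: floor(x) with its low bit set."""
    return math.floor(x) | 1


# Magic constants for w = 32. Double precision is adequate here; larger
# word sizes (e.g. w = 64) would need higher-precision arithmetic.
P32 = odd((math.e - 2) * 2**W)                     # expected 0xB7E15163
Q32 = odd(((1 + math.sqrt(5)) / 2 - 1) * 2**W)     # expected 0x9E3779B9


def rotl(x: int, s: int) -> int:
    s &= W - 1                                     # low lg w bits of s only
    return ((x << s) | (x >> (W - s))) & MASK


def key_schedule(key: bytes, r: int = 20) -> list:
    """Derive the 2r + 4 round keys S[0 .. 2r+3] from a b-byte user key."""
    u = W // 8
    c = max(1, -(-len(key) // u))                  # ceil(b / u); c = 1 if b = 0
    padded = key + bytes(c * u - len(key))         # append zero bytes
    L = [int.from_bytes(padded[k * u:(k + 1) * u], "little") for k in range(c)]

    t = 2 * r + 4
    S = [P32]
    for k in range(1, t):                          # first for loop
        S.append((S[k - 1] + Q32) & MASK)

    a = b = i = j = 0                              # registers A, B and indices
    for _ in range(3 * max(c, t)):                 # second for loop, v passes
        a = S[i] = rotl((S[i] + a + b) & MASK, 3)
        b = L[j] = rotl((L[j] + a + b) & MASK, a + b)
        i = (i + 1) % t
        j = (j + 1) % c
    return S
```

The little-endian int.from_bytes call stores the first key byte as the low-order byte of L[0], and an empty key yields c = 1 and L[0] = 0, matching the description above.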

[0025] FIG. 6 shows one possible implementation of the invention. A secure communication system 100 includes a transmitter 112 and a receiver 114 which communicate over a channel 116. A plaintext message is applied to an input 118 of the transmitter 112, and processed using the encryption techniques described in conjunction with FIGS. 1 and 2 to generate an encrypted message which is transmitted over the channel 116 to the receiver 114. The receiver 114 processes the encrypted message using the decryption techniques described in conjunction with FIGS. 3 and 4 to generate the corresponding plaintext message at an output 120. In this embodiment, the encryption techniques may be implemented in software which is executed by a processor 130 which includes the registers A, B, C and D previously described. Software instructions for carrying out the encryption may be stored in a memory 132 from which they are retrieved and executed by the processor 130. Similarly, the decryption techniques may be implemented in software which is executed by a processor 140 which also includes the registers A, B, C and D. Software instructions for carrying out the decryption may be stored in a memory 142 from which they are retrieved and executed by the processor 140. The registers A, B, C and D need not be internal to the processors 130 and 140, and in other embodiments could be, for example, part of the respective memories 132 and 142. Software implementations of the invention are very efficient in that the invention in the illustrative embodiment uses primitive operations, e.g., add, subtract, multiply, exclusive-or and rotate, that are very well-supported on available microprocessors.

[0026] The transmitter 112 and receiver 114 may be, for example, computers or other digital data processing devices connected by a local area network, a wide area network, an intranet, the Internet or any other suitable network. Alternatively, the transmitter 112 may be a smart card, and the receiver 114 may be a card reader. The communication channel 116 in such an embodiment is a connection established between the card and the reader when the card is inserted in the reader. In these and other embodiments of the invention, the encryption and decryption processes may be directly implemented in the processors 130 and 140 using hard-wired computation circuitry. In still other embodiments, combinations of hardware and software may be used to implement the encryption and decryption. The invention can also be implemented in the form of software stored on a computer-readable medium, such as a magnetic disk, an optical compact disk or an electronic memory. The term “processor” as used herein should be understood to include a microprocessor, central processing unit (CPU), microcontroller or other processing unit of a computer, set-top box, smart card, card reader, wireless terminal, personal digital assistant or other communication device, an application-specific integrated circuit (ASIC), field-programmable gate array (FPGA) device or other type of hardware configured to provide one or more of the computations described in conjunction with FIGS. 1 through 4, or any other type of device capable of implementing at least a portion of an encryption or decryption operation in accordance with the invention using hardware, software, or combinations thereof. The term “memory” should be understood to include an electronic random access memory (RAM) or other type of memory external to the above-defined processor, such as memory 132 or 142 of FIG. 6, or a memory which is internal to the processor, such as a processor memory which includes the registers A, B, C and D in FIG. 6.

[0027] Illustrative performance measurements for the above-described encryption and decryption processes are given in TABLE 1 below. The performance figures do not include key setup, and are therefore applicable to any key size b. The performance figures shown here for an optimized ANSI C implementation of RC6™-32/20/b were obtained using the compiler in Borland C++ Development Suite 5.0 as specified in the AES submission requirements. Performance was measured on a 266 MHz Pentium II computer with 32 MBytes of RAM running Windows 95. To improve the precision of the timing measurements, maskable interrupts on the processor were disabled while the timing tests were executed. The figures shown for an assembly language implementation of RC6™-32/20/16 were obtained on the same computer under similar conditions. The performance figures given for an optimized Java implementation of RC6™-32/20/b were measured on a 180 MHz Pentium II computer with 64 MBytes of RAM running Windows NT 4.0. This implementation was compiled on Javasoft's JDK 1.1.6 compiler, and the performance of the resulting byte code was measured both on Javasoft's JDK 1.1.6 interpreter (with JIT compilation disabled) and on Symantec Corporation's Java! JustInTime Compiler Version 210.054 for JDK 1.1.2. The figures shown have been scaled to 200 MHz, and it is expected that the AES-specified reference platform will produce comparable, or slightly improved, figures. TABLE 2 shows, for purposes of comparison, corresponding figures for RC5™-32/16/16 for the ANSI C, Java (JIT) and assembly implementations, generated using similar methodologies.
The figures in TABLES 1 and 2 are averages generated over ten executions of the described computations.

TABLE 1
Performance Figures for RC6™-32/20/b

Technique                          Cycles/Block   Blocks/sec at 200 MHz   MBytes/sec at 200 MHz
ANSI C RC6™-32/20/b encrypt                 616                 325,000                    5.19
ANSI C RC6™-32/20/b decrypt                 566                 353,000                    5.65
JAVA (JDK) RC6™-32/20/b encrypt          16,200                  12,300                    0.197
JAVA (JDK) RC6™-32/20/b decrypt          16,500                  12,100                    0.194
JAVA (JIT) RC6™-32/20/b encrypt           1,010                 197,000                    3.15
JAVA (JIT) RC6™-32/20/b decrypt             955                 209,000                    3.35
Assembly RC6™-32/20/b encrypt               254                 787,000                   12.6
Assembly RC6™-32/20/b decrypt               254                 788,000                   12.6

[0028] TABLE 2
Performance Figures for RC5™-32/16/16

Technique                          Cycles/Block   Blocks/sec at 200 MHz   MBytes/sec at 200 MHz
ANSI C RC5™-32/16/16 encrypt                328                 609,756                    4.9
JAVA (JIT) RC5™-32/16/16 encrypt          1,143                 174,978                    1.4
Assembly RC5™-32/16/16 encrypt              148               1,351,351                   10.8

[0029] It can be seen that the RC6™-32/20/b encryption process provides greater throughput in MBytes/sec at 200 MHz than the corresponding RC5™-32/16/16 encryption process, for each of the three exemplary software implementations of TABLE 2. As noted above, the encryption times given in TABLES 1 and 2 do not include key setup, and are independent of the length of the user-supplied key. It is expected that the key setup required for both RC6™-32/20/b and RC5™-32/16/b will be approximately the same. Timings in the ANSI C case were obtained by encrypting or decrypting a single 3,000-block piece of data. The timings in Java and assembly were obtained by encrypting or decrypting a single block 10,000 times. Faster implementations may well be possible.
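The throughput columns of TABLES 1 and 2 follow from the cycle counts alone, which makes the comparison easy to recheck. The sketch assumes the 200 MHz clock quoted above, a 16-byte block for RC6™-32 and an 8-byte (64-bit) block for RC5™-32; throughput is a hypothetical helper. Comparing MBytes/sec rather than cycles per block matters here, because RC6™-32/20/b spends more cycles per block than RC5™-32/16/16 but each block carries twice as many bytes.

```python
CLOCK_HZ = 200e6    # the 200 MHz reference clock used in TABLES 1 and 2


def throughput(cycles_per_block: float, block_bytes: int) -> tuple:
    """Return (blocks/sec, MBytes/sec) at CLOCK_HZ for a given cost per block."""
    blocks_per_sec = CLOCK_HZ / cycles_per_block
    return blocks_per_sec, blocks_per_sec * block_bytes / 1e6


# RC6-32 processes 16-byte blocks; RC5-32 processes 8-byte blocks.
rc6_ansi_c = throughput(616, 16)   # about 324,675 blocks/sec and 5.19 MBytes/sec
rc5_ansi_c = throughput(328, 8)    # about 609,756 blocks/sec and 4.88 MBytes/sec
print(rc6_ansi_c, rc5_ansi_c)
```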

[0030] Estimates will now be given for the performance of RC6™-32/20/16 on 8-bit platforms such as those that may be found in smart cards and other similar devices. In particular, estimates will be considered for the Intel MCS-51 microcontroller family. It is expected that the estimates can be considered to hold for other types of processors, such as the Philips 80C51 family, that have similar instruction sets and timings. A given round of RC6™-32/20/16 encryption includes six additions, two exclusive-ors, two squarings, two left-rotates by lg 32=5 bits, and two left-rotates by a variable quantity r. Note that this counts B×(2B+1)=2B²+B as a squaring and two additions. These basic operations can be implemented on an 8-bit processor in the following manner, ignoring addressing instructions:

[0031] 1. A 32-bit addition can be computed using four 8-bit additionswith carry (ADDC).

[0032] 2. A 32-bit exclusive-or can be computed using four 8-bit exclusive-ors (XRL).

[0033] 3. A 32-bit squaring can be computed using six 8-bit by 8-bitmultiplications (MUL) and eleven ADDCs. Note that six multiplicationsare enough since we only need the lower 32 bits of the 64-bit product.

[0034] 4. Rotating a 32-bit word left by five can be computed byrotating the word right by one bit position three times and thenpermuting the four bytes. Note that rotating the word right by one bitposition can be done using four byte rotations with carry (RRC).

[0035] 5. Rotating a 32-bit word left by r can be computed by rotatingthe word left or right by one bit position r′ times (r′≦4, with averagetwo) and then permuting the four bytes appropriately. The five bits of rare used to determine r′ and the permutation which can be controlledusing jumps (JB).

[0036] 6. Most instructions take one cycle except MUL which takes fourcycles and JB which takes two cycles.
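The byte-level techniques of items 3 and 4 can be emulated in Python to confirm that they compute the right values. This sketch models only the arithmetic, not the 8051 instructions themselves; square_low32_bytewise and rotl5_bytewise are hypothetical names, and each byte x_i simply stands in for an 8-bit register.

```python
MASK32 = 0xFFFFFFFF


def square_low32_bytewise(x: int) -> tuple:
    """Low 32 bits of x*x built from 8-bit by 8-bit products.

    Returns (result, number_of_byte_multiplications). Only byte products
    x_i * x_j with i + j <= 3 can reach the low 32 bits, and the cross terms
    (i != j) occur twice, so six multiplications suffice.
    """
    x0, x1, x2, x3 = ((x >> (8 * k)) & 0xFF for k in range(4))
    terms = [               # (product, byte position i + j, multiplicity)
        (x0 * x0, 0, 1),
        (x0 * x1, 1, 2),
        (x1 * x1, 2, 1),
        (x0 * x2, 2, 2),
        (x0 * x3, 3, 2),
        (x1 * x2, 3, 2),
    ]
    total = sum((k * p) << (8 * pos) for p, pos, k in terms)
    return total & MASK32, len(terms)


def rotl5_bytewise(x: int) -> int:
    """x <<< 5 as three 1-bit right rotations followed by a byte permutation."""
    for _ in range(3):      # each 1-bit rotation corresponds to four RRCs
        x = ((x >> 1) | ((x & 1) << 31)) & MASK32
    b = x.to_bytes(4, "little")
    # Moving every byte up one position is a rotation left by 8 bits,
    # so the net effect is a rotation left by 8 - 3 = 5 bits.
    return int.from_bytes(b[3:] + b[:3], "little")
```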

[0037] Using the above observations, the total number of processor clock cycles needed to implement one round of RC6™-32/20/16 on an 8-bit microcontroller or other similar platform is summarized in TABLE 3 below.

TABLE 3
Number of Cycles for One Round of RC6™-32/20/16 on 8-bit Platform

Operation          Instructions            Cycles/Operation   Contributing Cycles
add                4 ADDC                                 4   4 × 6 = 24
exclusive-or       4 XRL                                  4   4 × 2 = 8
squaring           6 MUL, 11 ADDC                        35   35 × 2 = 70
rotate left by 5   12 RRC                                12   12 × 2 = 24
rotate left by r   8 RRC or RLC, 8 JB                    24   24 × 2 = 48
Total                                                         174

[0038] Taking conservative account of the addressing instructions, the pre-whitening, post-whitening and any additional overheads, we estimate that encrypting one block of data with RC6™-32/20/16 requires about (174×20)×4=13,920 cycles. Assuming that a single cycle on the Intel MCS-51 microcontroller takes one microsecond, an estimate for the encryption speed of RC6™-32/20/16 on this particular processor is about (1,000,000/13,920)×128≈9.2 Kbits/second. As for the key setup, the dominant loop in the FIG. 5 process is the second for loop. For b=16, 24, 32 and r=20, the number of iterations in this loop is v=3×max{20×2+4, b/4}=132, which is independent of b. Each iteration in the second for loop uses four 32-bit additions, one rotate to the left by three, and one variable rotate to the left by r. In addition, there are some 8-bit operations which will be included as overheads. Following an analysis similar to that given above for the encryption process, the total estimated number of cycles for each iteration, ignoring addressing instructions, is 52. Again, making a conservative estimate for the additional overheads, we estimate about (52×132)×4=27,456 cycles to set up a 128-, 192- or 256-bit key, which will require about 27 milliseconds on an Intel MCS-51 microcontroller.
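The arithmetic behind these estimates can be reproduced as below. The per-operation cycle counts come from TABLE 3, and the factor of four and the one-microsecond cycle time are the conservative assumptions stated above; the factor of three in v follows the published RC6 key schedule and should be read as an assumption as far as this sketch is concerned.

```python
# TABLE 3: cycles per operation times operations per round.
cycles_per_round = 4 * 6 + 4 * 2 + 35 * 2 + 12 * 2 + 24 * 2
overhead = 4                    # conservative factor for addressing and whitening
rounds = 20
cycles_per_block = cycles_per_round * rounds * overhead
us_per_cycle = 1.0              # assumed MCS-51 cycle time in microseconds
blocks_per_sec = 1e6 / (cycles_per_block * us_per_cycle)
kbits_per_sec = blocks_per_sec * 128 / 1000

# Key setup: v = 3 * max(2r + 4, c) passes of about 52 cycles each, c = b / 4.
v_by_key_size = {b: 3 * max(2 * rounds + 4, b // 4) for b in (16, 24, 32)}
v = v_by_key_size[16]
setup_cycles = 52 * v * overhead
setup_ms = setup_cycles * us_per_cycle / 1000

print(cycles_per_round, cycles_per_block, round(kbits_per_sec, 1))  # 174 13920 9.2
print(v_by_key_size, setup_cycles, round(setup_ms))  # {16: 132, 24: 132, 32: 132} 27456 27
```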

[0039] An estimate of hardware requirements for a custom or semi-custom hardware implementation of the invention will now be provided. The most relevant parameters are the silicon area, speed and power consumption of a 32×32 integer multiplication. We estimate that this multiplication may require 120×100 microns (0.012 mm²) in area with a standard 0.25 micron CMOS process, about three ns for each multiply operation, and a power consumption of about five milliwatts. We conservatively estimate that a 32-bit variable rotate, i.e., a “barrel shifter,” would take about half of the area of the multiplier (0.006 mm²) and one ns for each operation. Also, we estimate that a 32-bit full adder would take one-quarter of the full multiplier area and around one ns. In addition, the function ƒ(x)=x(2x+1) mod 2^w can be computed by using only a multiplier that returns the bottom 32 bits of the 64-bit product rather than implementing a full 32×32 multiplier. We estimate that such a “partial” multiplier would take around 60% of the area of the full multiplier and a computation time of about three ns. Estimating an area of zero and computation time of zero for a 32-bit exclusive-or, and an area of 0.003 mm² and a computation time of one ns for a 32-bit carry-propagate add, the total required area (partial multiplier, barrel shifter and adder) is about 0.016 mm² and the total required computation time (multiply, variable rotate and add) is 5 ns. For an efficient implementation, two such sets of circuitry might be included on one chip. As a result, the total area would be about 0.032 mm² for those parts directly relevant to RC6™-32/20/16, with an additional 0.018 mm² for control, input/output and other overhead operations. The total computational area required in this exemplary hardware implementation is therefore on the order of 0.05 mm². Assuming that power consumption is proportional to area, we have a total power budget of about 21 milliwatts.
With 20 rounds per block, we have a total encryption time of approximately 5×20=100 ns for each block, giving an estimated data rate of about 1.3 Gbits/second (128 bits every 100 ns). We would expect the decryption time to be similar to that required for encryption, and both encryption and decryption time to be independent of the length of the user-supplied key. It should be noted that the above estimates are somewhat crude but also conservative. Savings might be possible by, for example, using only a multiplier that returns the bottom 32 bits rather than implementing a full 32×32 bit multiplier, or by implementing a circuit for squaring. It would also be possible to "unwind" the main encryption loop 20 times in some modes of use, which would allow for greatly improved performance at the cost of additional area and power consumption.
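The hardware totals in paragraph [0039] can be checked the same way. This rough Python sketch simply adds up the component estimates quoted in the paragraph, treats one round as a single pass through each component, and assumes, as the text does, that power scales linearly with area; the component labels and the dictionary layout are illustrative.

```python
FULL_MULT_AREA_MM2 = 0.012  # full 32x32 multiplier, 0.25-micron CMOS (from the text)
FULL_MULT_POWER_MW = 5.0    # power of the full multiplier (from the text)
SETS_PER_CHIP = 2           # two sets of round circuitry on one chip
OVERHEAD_AREA_MM2 = 0.018   # control, input/output and other overhead
ROUNDS = 20
BLOCK_BITS = 128

# (area in mm^2, delay in ns) for one set of round circuitry.
COMPONENTS = {
    "partial multiplier (low 32 bits)": (0.60 * FULL_MULT_AREA_MM2, 3.0),
    "32-bit barrel shifter": (0.50 * FULL_MULT_AREA_MM2, 1.0),
    "32-bit carry-propagate adder": (0.25 * FULL_MULT_AREA_MM2, 1.0),
    "32-bit exclusive-or": (0.0, 0.0),
}


def estimate() -> dict:
    """Sum the component estimates into chip-level area, power, latency and rate."""
    set_area = sum(area for area, _ in COMPONENTS.values())
    round_ns = sum(delay for _, delay in COMPONENTS.values())
    total_area = SETS_PER_CHIP * set_area + OVERHEAD_AREA_MM2
    power_mw = FULL_MULT_POWER_MW * total_area / FULL_MULT_AREA_MM2  # power ~ area
    block_ns = round_ns * ROUNDS
    return {
        "area per set (mm^2)": set_area,
        "time per round (ns)": round_ns,
        "total area (mm^2)": total_area,
        "power (mW)": power_mw,
        "time per block (ns)": block_ns,
        "data rate (Gbits/s)": BLOCK_BITS / block_ns,  # bits per ns == Gbits/s
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value:.4g}")
```

With these inputs the sketch gives about 0.016 mm² and 5 ns per set of circuitry, about 0.05 mm² and 21 mW in total, and 128 bits per 100 ns, i.e., roughly 1.3 Gbits/second.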

[0040] In terms of security, the best attack on RC6™ appears to be an exhaustive search for the user-supplied encryption key. The data requirements to mount more sophisticated attacks, such as differential and linear cryptanalysis, can be shown to exceed the available data. In addition, there are no known examples of so-called "weak" keys.

[0041] It should again be emphasized that the encryption and decryption techniques described herein are exemplary and should not be construed as limiting the present invention to any particular embodiment or group of embodiments. Alternative embodiments may use functions other than the exemplary quadratic described above, including polynomial functions with degree greater than two. In addition, the output of the function need not be taken mod 2^w. Moreover, although illustrated in an embodiment utilizing 32-bit words and a corresponding block size of 128 bits, the invention can be readily extended to other block sizes as required. For example, the invention could be configured with a 64-bit word size and a corresponding block size of 256 bits to take advantage of the performance offered by the next generation of system architectures. In addition, the illustrative embodiment of FIGS. 1 through 4 allows one to exploit a certain degree of parallelism in the encryption and decryption routines. For example, the computation of t and u at each round can be performed in parallel, as can the updates of values such as A and C. It can therefore be expected that embodiments of the invention will show improved throughput as processors move to include an increasing amount of internal parallelism. These and numerous other alternative embodiments within the scope of the appended claims will be readily apparent to those of ordinary skill in the art.
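To make the word-size and parallelism remarks concrete, here is a minimal Python sketch of the round structure, parameterized by a power-of-two word size w. It is illustrative, not the patented implementation: the register roles (t derived from B, u from D, A and C updated by data-dependent rotations) follow the published RC6™ description, ƒ(x)=x(2x+1) mod 2^w is the exemplary quadratic, and the round-key array S is supplied directly, with hypothetical random values standing in for the output of the key schedule.

```python
import random


def rotl(x: int, n: int, w: int) -> int:
    """Rotate the w-bit word x left by n bits; only n mod w (the low lg w bits) matters."""
    n %= w
    return ((x << n) | (x >> (w - n))) & ((1 << w) - 1)


def rotr(x: int, n: int, w: int) -> int:
    """Rotate the w-bit word x right by n bits."""
    return rotl(x, w - (n % w), w)


def f(x: int, w: int) -> int:
    """Quadratic f(x) = x(2x + 1) mod 2^w, i.e. a = 2 (even) and b = 1 (odd)."""
    return (x * (2 * x + 1)) & ((1 << w) - 1)


def encrypt(block, S, r, w=32):
    """Encrypt four w-bit words (A, B, C, D) with round-key array S of length 2r + 4."""
    lgw, mask = w.bit_length() - 1, (1 << w) - 1  # lg w, assuming w is a power of two
    A, B, C, D = block
    B, D = (B + S[0]) & mask, (D + S[1]) & mask   # pre-whitening
    for i in range(1, r + 1):
        t = rotl(f(B, w), lgw, w)  # t and u depend only on B and D,
        u = rotl(f(D, w), lgw, w)  # so they can be computed in parallel
        A = (rotl(A ^ t, u, w) + S[2 * i]) & mask
        C = (rotl(C ^ u, t, w) + S[2 * i + 1]) & mask
        A, B, C, D = B, C, D, A    # transpose the registers
    A, C = (A + S[2 * r + 2]) & mask, (C + S[2 * r + 3]) & mask  # post-whitening
    return A, B, C, D


def decrypt(block, S, r, w=32):
    """Invert encrypt() by running the rounds backwards with subtraction and right rotation."""
    lgw, mask = w.bit_length() - 1, (1 << w) - 1
    A, B, C, D = block
    A, C = (A - S[2 * r + 2]) & mask, (C - S[2 * r + 3]) & mask  # undo post-whitening
    for i in range(r, 0, -1):
        A, B, C, D = D, A, B, C    # undo the transposition
        t = rotl(f(B, w), lgw, w)
        u = rotl(f(D, w), lgw, w)
        A = rotr((A - S[2 * i]) & mask, u, w) ^ t
        C = rotr((C - S[2 * i + 1]) & mask, t, w) ^ u
    B, D = (B - S[0]) & mask, (D - S[1]) & mask   # undo pre-whitening
    return A, B, C, D


if __name__ == "__main__":
    r = 20
    rng = random.Random(2024)
    for w in (32, 64):  # 128-bit and 256-bit blocks
        S = [rng.getrandbits(w) for _ in range(2 * r + 4)]  # hypothetical round keys
        pt = tuple(rng.getrandbits(w) for _ in range(4))
        ct = encrypt(pt, S, r, w)
        assert decrypt(ct, S, r, w) == pt
        print(w, " ".join(f"{x:0{w // 4}x}" for x in ct))
```

Within a round, t and u are computed from B and D only, and the A and C updates do not read each other's results, which is the parallelism noted above. Decryption works because ƒ(x)−ƒ(y)=(x−y)(2x+2y+1) with an odd second factor, so ƒ is a permutation of w-bit words, and every other step (exclusive-or, rotation, modular addition) is invertible.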

What is claimed is:
 1. A method of encrypting a plaintext message, comprising the steps of: (a) segmenting the plaintext message into a plurality of words; (b) applying an integer multiplication function to at least one of the words; (c) rotating a value which is based on the result of the applying step (b) by a first number of bits; (d) rotating a value which is based on the result of the rotating step (c) by a second number of bits derived from another one of the words; (e) applying a secret key to a value which is based on the result of step (d); and (f) repeating steps (b), (c), (d) and (e) for a designated number of rounds.
 2. The method of claim 1 wherein at least one of the values which are based on the result of the applying step (b), the rotating step (c) and the rotating step (d) is the corresponding result itself.
 3. The method of claim 1 wherein the integer multiplication function is a quadratic function of the form ƒ(x)=x(ax+b), where a and b are integers.
 4. The method of claim 3 wherein a is an even integer and b is an odd integer.
 5. The method of claim 3 wherein a is zero and b is an odd integer which varies from round to round.
 6. The method of claim 3 wherein the integer multiplication function is a quadratic function of the form ƒ(x)=x(ax+b) (mod 2^w), where w is the number of bits in a given one of the words.
 7. The method of claim 1 wherein the rotating step (c) includes rotating a result of the applying step (b) by a predetermined number of bits given by lg w, where lg denotes log base 2 and w is the number of bits in a given one of the words.
 8. The method of claim 1 wherein step (a) includes segmenting the plaintext message into four words, step (b) includes applying the integer multiplication function to two of the four words, and step (c) includes rotating each of the two results of step (b) by a predetermined number of bits to generate two corresponding intermediate results.
 9. The method of claim 8 wherein steps (d) and (e) for a given one of the two words subject to steps (b) and (c) include the steps of: (i) computing an exclusive-or of one of the other words and one of the two intermediate results; (ii) rotating the result of step (i) by an amount given by the other intermediate result; and (iii) applying an element of a secret key array to the result of step (ii).
 10. The method of claim 1 further including the steps of storing the plurality of words in a corresponding plurality of registers as part of segmenting step (a), and transposing the contents of the registers after performing the applying step (e).
 11. The method of claim 1 further including the steps of pre-whitening at least a subset of the plurality of words before performing step (a) by applying an element of a secret key array to the subset.
 12. The method of claim 1 further including the steps of post-whitening at least a subset of the plurality of words after performing step (f) by applying an element of a secret key array to the subset.
 13. An apparatus for encrypting a plaintext message, comprising: a memory for storing at least a portion of a secret key; and a processor associated with the memory, wherein the processor is operative: (a) to segment the plaintext message into a plurality of words; (b) to apply an integer multiplication function to at least one of the words; (c) to rotate a value which is based on the result of operation (b) by a first number of bits; (d) to rotate a value which is based on the result of operation (c) by a second number of bits derived from another one of the words; (e) to apply the portion of the secret key to a value which is based on the result of operation (d); and (f) to repeat operations (b), (c), (d) and (e) for a designated number of rounds.
 14. The apparatus of claim 13 wherein at least one of the values which are based on the result of the apply operation (b), the rotate operation (c) and the rotate operation (d) is the corresponding result itself.
 15. The apparatus of claim 13 wherein the integer multiplication function is a quadratic function of the form ƒ(x)=x(ax+b), where a and b are integers.
 16. The apparatus of claim 15 wherein a is an even integer and b is an odd integer.
 17. The apparatus of claim 15 wherein a is zero and b is an odd integer which varies from one of the rounds to another of the rounds.
 18. The apparatus of claim 15 wherein the integer multiplication function is a quadratic function of the form ƒ(x)=x(ax+b) (mod 2^w), where w is the number of bits in a given one of the words.
 19. The apparatus of claim 13 wherein operation (c) includes rotating a result of operation (b) by a predetermined number of bits given by lg w, where lg denotes log base 2 and w is the number of bits in a given one of the words.
 20. The apparatus of claim 13 wherein operation (a) includes segmenting the plaintext message into four words, operation (b) includes applying the integer multiplication function to two of the four words, and operation (c) includes rotating each of the two results of operation (b) by a predetermined number of bits to generate two corresponding intermediate results.
 21. The apparatus of claim 20 wherein the processor is further operative to implement operations (d) and (e) for a given one of the two words subject to operations (b) and (c) by: (i) computing an exclusive-or of one of the other words and one of the two intermediate results; (ii) rotating the result of operation (i) by an amount given by the other intermediate result; and (iii) applying an element of a secret key array to the result of operation (ii).
 22. The apparatus of claim 13 wherein the processor is further operative to store the plurality of words in a corresponding plurality of registers as part of operation (a), and to transpose the contents of the registers after performing operation (e).
 23. The apparatus of claim 13 wherein the processor is further operative to pre-whiten at least a subset of the plurality of words, before performing operation (a), by applying an element of a secret key array to the subset.
 24. The apparatus of claim 13 wherein the processor is further operative to post-whiten at least a subset of the plurality of words after performing operation (f), by applying an element of a secret key array to the subset.
 25. A computer-readable medium for storing one or more programs for encrypting a plaintext message, wherein the one or more programs when executed implement the steps of: (a) segmenting the plaintext message into a plurality of words; (b) applying an integer multiplication function to at least one of the words; (c) rotating a value which is based on the result of the applying step (b) by a first number of bits; (d) rotating a value which is based on the result of the rotating step (c) by a second number of bits derived from another one of the words; (e) applying a secret key to a value which is based on the result of step (d); and (f) repeating steps (b), (c), (d) and (e) for a designated number of rounds.
 26. A method of decrypting a ciphertext message, comprising the steps of: (a) segmenting the ciphertext message into a plurality of words; (b) applying an integer multiplication function to at least one of the words; (c) rotating a value which is based on the result of the applying step (b) by a first number of bits; (d) rotating a value which is based on the result of the rotating step (c) by a second number of bits derived from another one of the words; (e) applying a secret key to a value which is based on the result of step (d); and (f) repeating steps (b), (c), (d) and (e) for a designated number of rounds.
 27. An apparatus for decrypting a ciphertext message, comprising: a memory for storing at least a portion of a secret key; and a processor associated with the memory, wherein the processor is operative: (a) to segment the ciphertext message into a plurality of words; (b) to apply an integer multiplication function to at least one of the words; (c) to rotate a value which is based on the result of operation (b) by a first number of bits; (d) to rotate a value which is based on the result of operation (c) by a second number of bits derived from another one of the words; (e) to apply the portion of the secret key to a value which is based on a result of operation (d); and (f) to repeat operations (b), (c), (d) and (e) for a designated number of rounds.